LIT TEXT UTILITY MANUAL
Version 2.0, 11/19/86
Copyright (C) 1986 Donald J. Irving
Lit is a command-line text utility which filters a text file to stdout,
printing printable characters as they are and showing all
non-printable characters in any one or more of three representation
formats. The only character interpreted (acted upon) by lit is the line
feed character which causes lit to issue a line feed. The inspiration for
lit came from the "l" command in many of the UNIX line editors. Lit is not
quite the same as any of these, however. For one thing, lit output is
never ambiguous.
Here is an example of what lit does:
Say the file 'myfile' consists of the following ascii characters:
HT, HT, h, e, l, l, o, space, w, o, r, l, d, BEL, LF
Saying 'lit myfile' would produce the following output:
\t\thello world\007\n
And saying 'lit myfile [various options]' might produce any of:
\t\thello world^G\n
^I^Ihello world^G^J
\011\011hello world\007\012
\09\09hello world\07\0A
\009\009hello world\007\010
You control the output with optional command line arguments which provide:
1. The name of the file to read as input.
2. What subset of the file lines to print.
3. In which format(s) to represent non-printable characters.
4. Which number base to use for numeric representations.
If you do not supply these, they default (in the original version) to:
1. Stdin.
2. The whole file.
3. Backslash constructs if possible else numeric representations.
4. Octal.
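The default behavior can be sketched in a few lines of Python. This is only an illustration of what the manual describes, not the original C source. The function name lit_default is made up for the example, "printable" is assumed to mean ASCII 0x20 through 0x7E, and a line feed is assumed to be shown as \n and then followed by an actual line feed.

```python
# Sketch of lit's documented defaults: backslash constructs where one
# exists, otherwise three-digit octal. Not the original implementation.
BACKSLASH_REPS = {0x0A: "\\n", 0x09: "\\t", 0x08: "\\b", 0x0D: "\\r", 0x0C: "\\f"}

def lit_default(data: bytes) -> str:
    out = []
    for byte in data:
        if byte in BACKSLASH_REPS:
            out.append(BACKSLASH_REPS[byte])
            if byte == 0x0A:
                out.append("\n")          # assumption: a real line feed follows \n
        elif byte == 0x5C:
            out.append("\\\\")            # a literal backslash is shown doubled
        elif 0x20 <= byte <= 0x7E:
            out.append(chr(byte))         # printable characters pass through
        else:
            out.append("\\%03o" % byte)   # everything else as octal
    return "".join(out)

# HT, HT, "hello world", BEL, LF
print(lit_default(b"\t\thello world\x07\n"), end="")
# prints: \t\thello world\007\n
```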
Here is the command line template. The arguments may be specified in any
order. The -bcanohd options may be stacked after one minus sign, or they
may appear as separate arguments.
lit [<filename>] [-s<linenum>] [-p<numlines>] [-[bcan][ohd]]
THE NAME OF THE INPUT FILE
The first command line argument encountered which does not start with a
minus sign is considered to be the input file name. Any subsequent
command line argument which does not start with a minus sign is considered
to be an error. If no command line argument is found which does not start
with a minus sign, lit uses <stdin> for input.
PRINTING A SUBSET OF LINES OF THE FILE
Lit prints the whole file by default. You can tell it on which line in the
file to start printing and/or how many lines to print by supplying either
or both of these command line arguments:
-s<linenum> lit will start printing at line <linenum>
-p<numlines> lit will print <numlines> lines
There is no space between the 's' or 'p' and the number. There is no
validity checking on the number values.
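The effect of -s and -p might be sketched as below. The helper name select_lines is hypothetical, and the sketch assumes that lines are numbered starting from 1, which the manual does not state. Unlike lit, it quietly clamps out-of-range values instead of leaving them unchecked.

```python
import itertools

def select_lines(lines, start=1, count=None):
    """Return an iterator over `count` lines beginning at line `start`.

    Assumptions: `start` is 1-based; count=None means "to end of file".
    """
    begin = max(start - 1, 0)
    stop = None if count is None else begin + max(count, 0)
    return itertools.islice(lines, begin, stop)

# Roughly what 'lit myfile -s2 -p2' would select from a four-line file.
print(list(select_lines(["one", "two", "three", "four"], start=2, count=2)))
# prints: ['two', 'three']
```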
FORMATS FOR REPRESENTING NON-PRINTABLE CHARACTERS
There are three formats in which non-printable characters may be
represented: C Language style backslash representations such as \n,
control character representations such as ^J, and numeric value
representations such as \012.
C Language Backslash Representations
The form is a backslash followed by a lower case letter. Here is the list
of the applicable characters:
    line feed          \n
    horizontal tab     \t
    backspace          \b
    carriage return    \r
    form feed          \f
The ascii NUL character representation \0 is omitted. NUL is represented
by its control character representation or as a numeric value.
Control Character Representations
The form is a caret followed by another symbol, where the second symbol is
the keyboard control character of the character to be represented. For
example, the ascii line feed character is represented as ^J. The ascii
character DEL has an arbitrarily assigned representation of ^?.
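The caret form follows from ASCII arithmetic: the keyboard character sits 64 places above the control code, which is the same as flipping bit 0x40. A minimal sketch follows. The function name control_rep is made up, and ^@ for NUL is inferred from the rule rather than quoted from the manual.

```python
def control_rep(code):
    """Return the ^X form for a control code (0-31 or DEL), else None."""
    if code < 0x20 or code == 0x7F:
        # Flipping bit 0x40 adds 64 to codes 0-31 and maps DEL (127) to '?'.
        return "^" + chr(code ^ 0x40)
    return None

print(control_rep(0x0A), control_rep(0x07), control_rep(0x7F))
# prints: ^J ^G ^?
```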
ASCII Numeric Value Representations
The representation is in the form \num where num is the character's
numeric value (the unsigned integer value of its eight bits) displayed in
any of the three number bases octal, decimal, or hexadecimal. For octal
representations, num is exactly three octal digits; for hex
representations, num is exactly two hexadecimal digits; and for decimal
representations, num is exactly three decimal digits. Num is zero-padded
on the left if necessary to make up the required number of digits. For
example, the ESC char is represented as \033, \027, or \1B in octal,
decimal, and hex respectively. NUL would be \000, \000, or \00. This
format is not limited to ascii characters; any eight bits can be
represented. Numbers of \200 (octal), \128 (decimal), \80 (hex), or
greater are byte values beyond the upper end of the ascii character set.
The largest byte value (all bits on) is \377 (octal), \255 (decimal), or
\FF (hex).
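The fixed-width rule can be expressed with ordinary format strings. The sketch below uses a hypothetical helper, numeric_rep, keyed by the same letters as the -o, -d, and -h options. Hex digits are upper case, as in the manual's examples.

```python
def numeric_rep(code, base="o"):
    """Format a byte value 0-255 as \\num, zero-padded to a fixed width."""
    formats = {
        "o": "\\%03o",   # exactly three octal digits
        "d": "\\%03d",   # exactly three decimal digits
        "h": "\\%02X",   # exactly two hexadecimal digits
    }
    return formats[base] % code

# ESC (decimal 27) in all three bases.
print(numeric_rep(27, "o"), numeric_rep(27, "d"), numeric_rep(27, "h"))
# prints: \033 \027 \1B
```

Because every representation has a fixed number of digits, a reader never has to guess where the number ends and ordinary text resumes.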
COMMAND LINE ARGUMENTS FOR SELECTING REPRESENTATION FORMATS
You tell lit which representation format or combination of formats to use
for non-printable characters by supplying one of the command line
arguments -b, -c, -a, or -n. If you supply none of these, then -b is
selected by default. If you supply more than one, then the last one
given supersedes the earlier ones.
    -b    use backslash representations such as \n
          if possible, else use numeric representations.
    -c    use control char representations such as ^J
          if possible, else use numeric representations.
    -a    all; use backslash reps if possible, else use control
          char reps if possible, else use numeric representations.
    -n    use numeric representations only.
You tell lit which number base to use for numeric representations by
providing one of the command line arguments -o, -h, or -d. If you supply
none of these, then -o is selected by default. If you supply more than
one, then the last one given supersedes the earlier ones.
-o octal
-h hexadecimal
-d decimal
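One way to picture the argument rules (any order, stacked letters, later options overriding earlier ones) is the sketch below. It is not lit's actual parser: parse_args and its dictionary keys are invented for illustration, and it checks the -s and -p numbers, which the manual says lit does not do.

```python
def parse_args(args):
    """Parse lit-style arguments into a dict; raise ValueError on misuse."""
    opts = {"file": None, "start": None, "count": None,
            "format": "b", "base": "o"}          # documented defaults
    for arg in args:
        if not arg.startswith("-"):
            if opts["file"] is not None:
                raise ValueError("unexpected second file name: " + arg)
            opts["file"] = arg                   # first non-option is the file
        elif arg.startswith("-s"):
            opts["start"] = int(arg[2:])         # -s<linenum>, no space
        elif arg.startswith("-p"):
            opts["count"] = int(arg[2:])         # -p<numlines>, no space
        else:
            for letter in arg[1:]:               # letters may be stacked
                if letter in "bcan":
                    opts["format"] = letter      # a later letter overrides
                elif letter in "ohd":
                    opts["base"] = letter
                else:
                    raise ValueError("unknown option letter: " + letter)
    return opts

print(parse_args(["myfile", "-s10", "-ch"]))
# prints: {'file': 'myfile', 'start': 10, 'count': None, 'format': 'c', 'base': 'h'}
```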
EXCEPTIONAL CHARACTERS
Two characters have special meaning in lit output. The backslash character
\ always has special meaning. The caret character ^ has special meaning
whenever control character representations are enabled.
The Backslash Character \
As already described, the \ character in lit output signals the beginning
of either a special letter representation such as \n or a numeric
representation such as \012. The \ is also used to relieve a subsequent \
or ^ of its special meaning. \\ represents the actual character \, and
(when control character representations are enabled) \^ represents the
actual character ^.
The Caret Character ^
When control character representations are enabled, a ^ signals the
beginning of a control character representation such as ^J. Note the
implication therefore that ^^ means Control caret (ascii RS), and ^\ means
Control backslash (ascii FS). In both of these cases the second character
is relieved of its special meaning because it is part of the control
character representation. If control character representations are not
enabled, then ^ is just another printable character.
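The escaping rule for these two printable characters is small enough to show directly. This is a sketch with a hypothetical helper name, printable_rep. The flag control_reps_enabled stands for "lit was run with -c or -a".

```python
def printable_rep(ch, control_reps_enabled):
    """Show one printable character the way the manual describes."""
    if ch == "\\":
        return "\\\\"                 # backslash is always escaped
    if ch == "^" and control_reps_enabled:
        return "\\^"                  # caret is escaped only when ^X forms are in use
    return ch

print(printable_rep("^", True), printable_rep("^", False), printable_rep("\\", False))
# prints: \^ ^ \\
```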
CONCLUSION
Lit fills the gap between text editors which usually interpret special
characters in special ways, and hex dump utilities which make terrible
reading for text files. One of lit's greatest strengths is that it
interprets nothing but the linefeed character; everything else is just
represented to the output stream.
Although lit provides a variety of output formats, perhaps its main
usefulness is in quickly locating U.F.O.s (Unidentified File Objects) that
have gotten into your text files (like that ESC char that's weirding out
your printer). For this purpose, the default options are adequate, and, for
C programmers at least, already familiar.
Donald J. Irving
9812 Gardenwood Way
Sacramento, CA 95827
(916) 366-3225
CIS: 73547,1335
PLINK: ops158
Post scripts:
**
One convenient way of getting to know lit is to use the default input file
stdin. Just say 'lit [-options]' with no file name. Now you can type in
lines one at a time and have lit filter them back to you. Try typing
control characters to see how they come back. Keep in mind that in this
configuration, the CLI is still trapping and interpreting (acting upon)
what you type, so screen control characters such as form feed and tab
actually cause form feeds and tabs to occur on the screen before
lit has a chance to send you its output. This may make the screen look a
little messy, but at least if the CLI is interpreting everything it can
tell when you type Control C to break out.
**
Want to have lit give you a Usage statement? Say 'lit lskdmlsdm' where
lskdmlsdm is any string of garbage which doesn't add up to the name of a
real file.
**
Why not use \0 to represent NUL? Consider the following character
sequence:
BEL, space, NUL, 0, 7
Using \0 for NUL would yield the output '\007 \007', which is exactly what
the sequence BEL, space, BEL produces. To avoid this ambiguity, the \0
construct is not included in the backslash representations.
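The collision is easy to reproduce. The sketch below uses a deliberately simplified renderer (the name render and its nul_rep parameter are invented for the demonstration) and compares a short \0 against the fixed-width \000.

```python
def render(data, nul_rep):
    """Simplified renderer: NUL as `nul_rep`, other non-printables as octal."""
    out = []
    for byte in data:
        if byte == 0:
            out.append(nul_rep)
        elif 0x20 <= byte <= 0x7E and byte != 0x5C:
            out.append(chr(byte))
        else:
            # Everything else (including backslash, for simplicity) as octal.
            out.append("\\%03o" % byte)
    return "".join(out)

sample = bytes([0x07, 0x20, 0x00]) + b"07"     # BEL, space, NUL, '0', '7'
two_bels = bytes([0x07, 0x20, 0x07])           # BEL, space, BEL

print(render(sample, "\\0") == render(two_bels, "\\0"))       # prints: True
print(render(sample, "\\000") == render(two_bels, "\\000"))   # prints: False
```

With the three-digit form the sample comes out as \007 \00007, which can only be read as NUL followed by the text 07.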
**
Why use ^? for DEL? Keyboard control characters are always 64 places
higher in the ascii table than the non-printable characters they
represent. DEL is at the high end of the ascii character set, however, so
there's no keyboard character to represent it. We need to arbitrarily
choose some character. The ? seems to make at least some sense as a
choice; it is 64 places less than DEL, and that kind of satisfies one's
desire for symmetry in the world. (Besides, some of the UNIX world tools
already do it that way.)
**
If you don't like the default option settings, they are very simple to
change in the C source. If you don't have a C compiler, and can't live
with the settings, I will be willing to recompile it with your desired
option settings. Send me a disk in a protective mailer and include return
postage. I will return your disk in the same mailer.